feat: Support remote filesystem seeds #765
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Fern preview: https://nvidia-preview-pr-765.docs.buildwithfern.com/nemo/datadesigner
Greptile Summary

This PR introduces a

| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py | Introduces FileSystemProvider protocol + LocalFileSystemProvider default, defers existence checks to read time, adds SeedReaderConfigError, and narrows AgentRolloutSeedReader to require a concrete Path |
| packages/data-designer-config/src/data_designer/config/seed_source.py | Removes load-time directory existence validation and path resolution from FileSystemSeedSource; runtime_path now returns the raw path string for resolution by the provider at read time |
| packages/data-designer-engine/src/data_designer/engine/compiler.py | Wraps get_column_names() in a SeedReaderConfigError catch that converts to InvalidConfigError, propagating filesystem validation failures cleanly to the compile phase |
| packages/data-designer-config/tests/config/test_seed_source.py | Updated and expanded tests to cover deferred validation, raw runtime_path preservation, AgentRollout fallback defaults, and plugin subclass inheritance of runtime_path |
| packages/data-designer-engine/tests/engine/resources/test_seed_reader.py | Updates CWD-change test to assert read-time resolution (beta.txt from later_seed_dir rather than alpha.txt from initial_seed_dir), and adds a test for missing-root error propagation |
| packages/data-designer-engine/tests/engine/test_compiler.py | Adds test verifying SeedReaderConfigError raised from get_column_names is re-raised as InvalidConfigError with the original as cause |
| fern/versions/latest/pages/concepts/seed-datasets.mdx | Documentation updated to note that relative local paths are resolved by the active filesystem provider at validation/read time, not at config construction |
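
The compiler change described in the table (catching `SeedReaderConfigError` around `get_column_names()` and re-raising it as `InvalidConfigError`) can be illustrated with a minimal sketch. Both exception classes below are local stand-ins, and `MissingRootReader` and `resolve_seed_columns` are hypothetical names used only for illustration; the actual code in `compiler.py` may be structured differently.

```python
class SeedReaderConfigError(Exception):
    """Stand-in for the reader-level error named in this PR."""


class InvalidConfigError(Exception):
    """Stand-in for the compile-phase error named in this PR."""


class MissingRootReader:
    """Hypothetical reader whose seed root is missing at read time."""

    def get_column_names(self) -> list:
        raise SeedReaderConfigError("seed root does not exist: seeds/")


def resolve_seed_columns(reader) -> list:
    """Hypothetical helper: convert reader errors into compile-phase errors."""
    try:
        return reader.get_column_names()
    except SeedReaderConfigError as exc:
        # `from exc` keeps the original error reachable via __cause__.
        raise InvalidConfigError(str(exc)) from exc
```

Chaining with `from exc` is what lets a test assert that the original `SeedReaderConfigError` is preserved as the cause, as the `test_compiler.py` row describes.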
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Caller
participant FileSystemSeedReader
participant FileSystemProvider
participant SeedReaderFileSystemContext
Caller->>FileSystemSeedReader: get_column_names()
FileSystemSeedReader->>FileSystemSeedReader: _get_filesystem_context()
FileSystemSeedReader->>FileSystemProvider: ensure_root_exists(runtime_path)
alt path does not exist
FileSystemProvider-->>FileSystemSeedReader: raise SeedReaderConfigError
FileSystemSeedReader-->>Caller: raise SeedReaderConfigError
end
FileSystemSeedReader->>FileSystemSeedReader: create_filesystem_context(runtime_path)
FileSystemSeedReader->>FileSystemProvider: create_context(runtime_path)
FileSystemProvider-->>FileSystemSeedReader: SeedReaderFileSystemContext(fs, root_path)
FileSystemSeedReader->>SeedReaderFileSystemContext: build_manifest(context)
SeedReaderFileSystemContext-->>FileSystemSeedReader: manifest rows
FileSystemSeedReader-->>Caller: column names
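
The flow in the diagram can be sketched roughly as follows. The class and method names are taken from the diagram, but every signature and body here is an assumption made for illustration: the context models only `root_path` (the real one is shown carrying `fs` as well), `build_manifest` is placed on the reader for simplicity, and the manifest row shape is invented.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


class SeedReaderConfigError(Exception):
    """Stand-in for the error raised when the seed root is missing."""


@dataclass(frozen=True)
class SeedReaderFileSystemContext:
    # Assumption: only root_path is modeled; the diagram also shows an fs handle.
    root_path: Path


class FileSystemProvider(Protocol):
    def ensure_root_exists(self, runtime_path: str) -> None: ...

    def create_context(self, runtime_path: str) -> SeedReaderFileSystemContext: ...


class LocalFileSystemProvider:
    """Default provider: resolves paths against the local filesystem."""

    def ensure_root_exists(self, runtime_path: str) -> None:
        if not Path(runtime_path).exists():
            raise SeedReaderConfigError(f"Seed root does not exist: {runtime_path}")

    def create_context(self, runtime_path: str) -> SeedReaderFileSystemContext:
        # Resolution happens here, at read time, not at config construction.
        return SeedReaderFileSystemContext(root_path=Path(runtime_path).resolve())


class FileSystemSeedReader:
    def __init__(self, runtime_path: str, provider: FileSystemProvider | None = None) -> None:
        self._runtime_path = runtime_path
        self._provider = provider or LocalFileSystemProvider()

    def build_manifest(self, context: SeedReaderFileSystemContext) -> list[dict[str, str]]:
        # Hypothetical row shape: one row per file directly under the root.
        files = sorted(p for p in context.root_path.iterdir() if p.is_file())
        return [{"relative_path": p.name} for p in files]

    def get_column_names(self) -> list[str]:
        self._provider.ensure_root_exists(self._runtime_path)
        context = self._provider.create_context(self._runtime_path)
        rows = self.build_manifest(context)
        return sorted({key for row in rows for key in row})


# Usage: a temporary directory with one seed file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "alpha.txt").write_text("alpha")
    columns = FileSystemSeedReader(tmp).get_column_names()
```

Because the reader depends only on the protocol, a different provider can be passed in without changing the reader itself.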
Reviews (3): Last reviewed commit: "Fix stale docstring"
📋 Summary

Adds support for injecting fsspec filesystems into `DirectorySeedReader` and `FileContentsSeedReader` so that they can be used in non-local contexts.

🔗 Related Issue

Implements this plan

🔄 Changes

- Adds `FileSystemProvider(Protocol)` and a default implementation `LocalFileSystemProvider`, adding a seam where previously the local filesystem was effectively hardcoded
- Changes `FileSystemSeedSource` and its subclasses to not validate dir/file existence upon config object creation, instead deferring that check to validation/read time
- `FileSystemSeedSource` and its subclasses

🧪 Testing

`make test` passes

✅ Checklist
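
To illustrate the two behaviors described above (no existence check at config creation, and a reader that works against an injected non-local filesystem), here is a minimal sketch. Everything in it is hypothetical: `InMemoryFileSystem` is a dict-backed stand-in that only mimics fsspec-style `exists()` and `ls()` methods, and `ToyDirectorySeedSource` and `ToyDirectorySeedReader` are simplified illustrations, not the classes changed in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


class SeedReaderConfigError(Exception):
    """Stand-in for the read-time error named in this PR."""


class InMemoryFileSystem:
    """Dict-backed stand-in exposing fsspec-style exists() and ls()."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = dict(files)

    def exists(self, path: str) -> bool:
        prefix = path.rstrip("/") + "/"
        return path in self._files or any(name.startswith(prefix) for name in self._files)

    def ls(self, path: str) -> list[str]:
        prefix = path.rstrip("/") + "/"
        return sorted(name for name in self._files if name.startswith(prefix))


@dataclass(frozen=True)
class ToyDirectorySeedSource:
    """Hypothetical config object: stores the path without touching any filesystem."""

    path: str

    @property
    def runtime_path(self) -> str:
        # Raw string, unresolved; the filesystem decides what it means at read time.
        return self.path


class ToyDirectorySeedReader:
    """Hypothetical reader that takes its filesystem as a constructor argument."""

    def __init__(self, source: ToyDirectorySeedSource, fs: InMemoryFileSystem) -> None:
        self._source = source
        self._fs = fs

    def list_files(self) -> list[str]:
        root = self._source.runtime_path
        if not self._fs.exists(root):  # deferred existence check
            raise SeedReaderConfigError(f"Seed root does not exist: {root}")
        return self._fs.ls(root)


fs = InMemoryFileSystem({"bucket/seeds/a.txt": b"alpha", "bucket/seeds/b.txt": b"beta"})

# Constructing a source for a missing path does not raise; only reading does.
missing_source = ToyDirectorySeedSource(path="bucket/missing")

files = ToyDirectorySeedReader(ToyDirectorySeedSource(path="bucket/seeds"), fs).list_files()
```

In a real setup the injected object would presumably be an fsspec filesystem rather than this stand-in; the point of the sketch is only that the reader never assumes a local disk.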